Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
Abstract
We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed sampler enforces the correct stationary distribution of the Markov chain without the need for finite approximations. Empirical results illustrate that the new sampler exhibits better convergence properties than current methods.
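The restricted Gibbs iteration mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions not stated in the abstract: one-dimensional Gaussian clusters with a known shared standard deviation `sigma`, cluster means held fixed for brevity, and cluster weights drawn from a Dirichlet distribution over the current cluster counts plus the concentration parameter `alpha`. The function names `sample_dirichlet` and `restricted_gibbs_sweep` are hypothetical. The sweep is "restricted" in that labels are resampled only among existing clusters, so it can never create a new cluster on its own, which is why such a step would need split/merge proposals to give an ergodic chain.

```python
import math
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def restricted_gibbs_sweep(data, labels, means, alpha, sigma, rng):
    """One restricted sweep (illustrative assumption, not the paper's code).

    Weights are drawn from Dirichlet(N_1, ..., N_K, alpha); the last entry
    is the mass left over for clusters that have not been instantiated.
    Labels are then resampled among the K existing clusters only.
    Assumes every existing cluster currently holds at least one point.
    """
    num_clusters = len(means)
    counts = [labels.count(k) for k in range(num_clusters)]
    weights = sample_dirichlet(counts + [alpha], rng)

    new_labels = []
    for x in data:
        # Unnormalized log posterior of each existing cluster for point x.
        log_post = [
            math.log(weights[k]) - 0.5 * ((x - means[k]) / sigma) ** 2
            for k in range(num_clusters)
        ]
        shift = max(log_post)  # subtract the max for numerical stability
        probs = [math.exp(lp - shift) for lp in log_post]
        new_labels.append(rng.choices(range(num_clusters), weights=probs)[0])
    return new_labels, weights


rng = random.Random(0)
data = [-5.1, -4.9, -5.0, 5.0, 5.2, 4.8]
labels = [0, 1, 0, 1, 0, 1]  # deliberately poor initial assignment
new_labels, weights = restricted_gibbs_sweep(
    data, labels, means=[-5.0, 5.0], alpha=1.0, sigma=1.0, rng=rng
)
```

Because each point's label depends only on the shared weights and cluster parameters, the loop over `data` could be distributed across workers, which is the kind of parallelism the abstract refers to.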
Related resources
Supplemental Material for Parallel Sampling of DP Mixture Models using Sub-Clusters Splits
In this section, we show the derivation of the posterior distribution over cluster weights, π, conditioned on the cluster labels, z. We begin with the definition of a Dirichlet process from [1]. Definition A.1 (Dirichlet Process). Let H be a measure on a measurable space, Ω. If for any finite partition, (A_1, A_2, ..., A_K) of the space, the measure, G, on the partition follows the following D...
Sampling in computer vision and Bayesian nonparametric mixtures
The field of computer vision focuses on understanding and reasoning about the visual world. Due to the complexity of this problem, researchers often focus on one specific component of this large task, such as segmentation or recognition. This modularized approach necessitates the combination of each separate component, which Bayesian formulations handle in a mathematically consistent framework....
Parallel Sampling of HDPs using Sub-Cluster Splits
We develop a sampling technique for Hierarchical Dirichlet process models. The parallel algorithm builds upon [1] by proposing large split and merge moves based on learned sub-clusters. The additional global split and merge moves drastically improve convergence in the experimental results. Furthermore, we discover that cross-validation techniques do not adequately determine convergence, and tha...
Distributed Inference for Dirichlet Process Mixture Models
Bayesian nonparametric mixture models based on the Dirichlet process (DP) have been widely used for solving problems like clustering, density estimation and topic modelling. These models make weak assumptions about the underlying process that generated the observed data. Thus, when more data are collected, the complexity of these models can change accordingly. These theoretical properties often...